Experimental Assessment of Parallel Systems

Authors

  • João Gabriel Silva
  • Joao Carreira
  • Henrique Madeira
  • Diamantino Costa
  • Francisco Moreira
Abstract

In the research reported in this paper, transient faults were injected, using software fault injection, into the nodes and the communication subsystem of a commercial parallel machine running several real applications. The results showed that a significant percentage of faults caused the system to produce wrong results while the application appeared to terminate normally. This demonstrates that fault tolerance techniques are required in parallel systems, not only to ensure that long-running applications terminate but also, and more importantly, to ensure that the results they produce are correct. Of the techniques tested to reduce the percentage of undetected wrong results, only algorithm-based fault tolerance (ABFT) proved effective. For other simple error detection methods to be effective, they must be designed in rather than added as an afterthought. Faults injected into the communication subsystem demonstrated the effectiveness of end-to-end CRCs on data movements between processors.
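The two detection mechanisms the abstract reports as effective can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not the authors' fault-injection tool or their ABFT implementation: it assumes integer matrices (so the checksum comparison is exact and needs no floating-point tolerance), uses the classic row/column checksum encoding for matrix multiplication, emulates a transient fault as a single bit flip, and uses CRC-32 from Python's `zlib` as the end-to-end check on a message. All function names (`abft_matmul`, `abft_check`, `send`, `receive`, `corrupt`) are hypothetical.

```python
import zlib
from typing import List, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def abft_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply checksum-encoded operands (toy ABFT, integer matrices assumed).

    A gets an extra row of column sums and B an extra column of row sums,
    so the (n+1) x (p+1) result carries a checksum row and column for C = A*B.
    """
    a_col_checksum = a + [[sum(col) for col in zip(*a)]]
    b_row_checksum = [row + [sum(row)] for row in b]
    return matmul(a_col_checksum, b_row_checksum)


def abft_check(cf: Matrix) -> Tuple[List[int], List[int]]:
    """Return the rows and columns of C whose sums disagree with the checksums."""
    n, p = len(cf) - 1, len(cf[0]) - 1
    bad_rows = [i for i in range(n) if sum(cf[i][:p]) != cf[i][p]]
    bad_cols = [j for j in range(p)
                if sum(cf[i][j] for i in range(n)) != cf[n][j]]
    return bad_rows, bad_cols


def send(payload: bytes) -> bytes:
    """Sender side: append a CRC-32 of the payload (end-to-end check)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def receive(frame: bytes) -> bytes:
    """Receiver side: recompute the CRC and reject corrupted messages."""
    payload, tag = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "little") != tag:
        raise ValueError("CRC mismatch: message corrupted in transit")
    return payload


def corrupt(frame: bytes, bit: int) -> bytes:
    """Emulate a transient fault in the communication path: flip one bit."""
    buf = bytearray(frame)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


if __name__ == "__main__":
    cf = abft_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(abft_check(cf))      # ([], []): C = [[19, 22], [43, 50]] is consistent
    cf[0][1] ^= 1 << 3         # inject a bit flip into C[0][1] (22 -> 30)
    print(abft_check(cf))      # ([0], [1]): the error is detected and located

    frame = send(b"partial result")
    print(receive(frame))      # b'partial result'
    try:
        receive(corrupt(frame, 5))
    except ValueError as err:
        print(err)             # CRC mismatch: message corrupted in transit
```

Without the checksum test, the flipped element would simply be returned as part of C, which is the "wrong result with normal termination" outcome the abstract describes. In this toy scheme a single corrupted element of C is both detected and located at the intersection of the mismatching row and column; the corner checksum element is not itself verified. CRC-32 detects every single-bit error in the frame, including flips that land in the CRC field.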

Publication date: 1996